Spelling Correction in Agglutinative Languages
Abstract
Spelling correction is an important component of any system for processing text. Agglutinative languages such as Turkish or Finnish differ from languages like English in the way lexical forms are generated. A typical nominal or verbal root may generate thousands (or even millions) of valid forms, which never appear in the dictionary. For instance, we can give the following (rather exaggerated) example from Turkish:

uygarlaştıramayabileceklerimizdenmişsinizcesine

whose morpheme breakdown is:

uygar      -laş     -tır   -ama  -yabil  -ecek  -ler  -imiz      -den  -miş   -siniz  -cesine
civilized  +BECOME  +CAUS  +NEG  +POT    +FUT   +3PL  +POSS-1SG  +ABL  +NARR  +2PL    +AS-IF

Methods developed for spelling correction in languages like English (see the review by Kukich, 1992) are not readily applicable to agglutinative languages. This poster presents an approach to spelling correction in agglutinative languages that is based on two-level morphology and a dynamic-programming-based search algorithm. After an overview of our approach, we present results from experiments with spelling correction in Turkish.
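To make the dynamic-programming ingredient concrete, here is a minimal Python sketch of edit-distance-based correction over generated word forms. It is not the poster's algorithm: the abstract describes a search driven by two-level morphology, whereas this sketch simply enumerates every form of a tiny, hypothetical suffix grammar and scores each one. The names `edit_distance`, `generate_forms`, `suggest`, `ROOTS`, and `SLOTS` are all assumptions made for illustration, and the toy grammar ignores vowel harmony (the roots were chosen so that the fixed suffix shapes happen to be valid).

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical toy morphology: a root followed by one optional suffix per slot
# (plural, 1st person plural possessive, ablative). No vowel harmony is modeled.
ROOTS = ["ev", "el"]
SLOTS = [["", "ler"], ["", "imiz"], ["", "den"]]


def generate_forms():
    """Yield every surface form the toy grammar licenses."""
    for root, *suffixes in product(ROOTS, *SLOTS):
        yield root + "".join(suffixes)


def suggest(word: str, max_dist: int = 1):
    """Return (distance, form) pairs within max_dist, closest first."""
    scored = ((edit_distance(word, form), form) for form in generate_forms())
    return sorted((d, f) for d, f in scored if d <= max_dist)


if __name__ == "__main__":
    print(suggest("evlermizden"))
```

Exhaustive enumeration is workable here only because the toy grammar yields 16 forms. With thousands or millions of forms per root, as the abstract notes for real agglutinative languages, listing every form stops being practical, which is the motivation for searching the morphological description directly.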
Similar resources
Spelling Correction: from Two-Level Morphology to Open Source
Basque is a highly inflected and agglutinative language (Alegria et al., 1996). Two-level morphology has been applied successfully to languages of this kind, and two-level descriptions exist for very different languages. Once the morphological description of a language is done, it is easy to develop a spelling checker/corrector for that language. However, what happens if we want to use...
Spell-Checking based on Syllabification and Character-level Graphs for a Peruvian Agglutinative Language
There are several native languages in Peru which are mostly agglutinative. These languages are transmitted from generation to generation mainly in oral form, causing different forms of writing across different communities. For this reason, there are recent efforts to standardize the spelling in the written texts, and it would be beneficial to support these tasks with an automatic tool such as a...
Design and implementation of Persian spelling detection and correction system based on Semantic
The Persian language has special features (graphemes, homophones, and multi-shape clinging characters) on electronic devices. Furthermore, designing and implementing NLP tools for Persian is more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...
Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction
This paper presents the notion of error-tolerant recognition with finite-state recognizers along with results from some applications. Error-tolerant recognition enables the recognition of strings that deviate mildly from any string in the regular set recognized by the underlying finite-state recognizer. Such recognition has applications to error-tolerant morphological processing, spelling corre...
A String Similarity Measure Based on Orthographic and Phonetic Similarity for Spelling Correction
The most commonly used string similarity measure for spelling correction is minimum edit distance (MED), which is based solely on the orthographic similarity between two strings. In order to overcome this shortcoming, this paper presents a more sophisticated similarity measure that considers both the orthographic and phonetic similarity between two strings. To demonstrate the effectiveness of t...